Automated Knowledge Discovery from Simulators
Abstract
In this paper, we explore one aspect of knowledge discovery from simulators: the landscape characterization problem, in which the aim is to identify regions of the input, parameter, or model space that lead to a particular output behavior. Large-scale numerical simulators are in widespread use by scientists and engineers in government agencies, academia, and industry; in many cases, simulators provide the only means to examine processes that are infeasible or impossible to study otherwise. However, simulation studies can be costly, both in the time and computational resources required to conduct the trials and in the manpower needed to sift through the resulting output. There is therefore strong motivation to develop automated methods that extract knowledge more efficiently. Unlike traditional data mining, knowledge discovery from simulators is not limited to a static, predetermined dataset; instead, the simulator itself can serve as an oracle that generates new data of our own choosing. We exploit this opportunity by using active learning with support vector machines (SVMs) to choose which simulation trials are most valuable to run next. On two real-world scientific simulators, one for asteroid collisions and one for magnetospheric modeling, we demonstrate twofold and sixfold reductions, respectively, in the number of simulator trials required to achieve a given level of fidelity in landscape characterization, compared with a standard grid-based sampling approach.
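The abstract does not spell out the selection rule, so the following is only a minimal sketch of the general idea of SVM-based active learning for landscape characterization, not the authors' method. It assumes a hypothetical `simulator` oracle (a toy disc-shaped "interesting" region in a 2-D parameter space), a candidate pool laid out on a grid, and uncertainty sampling: after each retraining, the unlabeled candidate with the smallest |f(x)| (closest to the current decision boundary) is sent to the simulator. To stay dependency-free, the RBF-kernel SVM is trained with kernelized Pegasos (a stochastic sub-gradient hinge-loss solver without a bias term), swapped in for whatever SVM solver the paper used; `gamma`, `lam`, `iters`, and `budget` are illustrative values.

```python
import math
import random


def simulator(x):
    """Hypothetical stand-in for an expensive simulator run (assumption).

    Returns +1 if the input lands in the 'interesting' output regime
    (here: inside a disc of radius 0.6), else -1.
    """
    return 1 if x[0] ** 2 + x[1] ** 2 < 0.36 else -1


def rbf(a, b, gamma=5.0):
    # Gaussian (RBF) kernel between two 2-D points.
    return math.exp(-gamma * ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2))


def train_kernel_svm(xs, ys, lam=0.01, iters=1000, seed=0):
    """Kernelized Pegasos: stochastic sub-gradient solver for a hinge-loss SVM
    without a bias term. Returns f such that sign(f(x)) is the predicted label."""
    rng = random.Random(seed)
    alpha = [0] * len(xs)  # alpha[j] counts how often point j violated the margin
    for t in range(1, iters + 1):
        i = rng.randrange(len(xs))
        s = sum(alpha[j] * ys[j] * rbf(xs[j], xs[i]) for j in range(len(xs)) if alpha[j])
        if ys[i] * s / (lam * t) < 1:
            alpha[i] += 1

    def f(x):
        return sum(alpha[j] * ys[j] * rbf(xs[j], x)
                   for j in range(len(xs)) if alpha[j]) / (lam * iters)
    return f


def active_learning(pool, seeds, budget):
    """Uncertainty-sampling loop: each round retrains the SVM and runs the
    simulator on the unlabeled candidate closest to the decision boundary."""
    labeled = {p: simulator(p) for p in seeds}
    for _ in range(budget):
        xs = list(labeled)
        f = train_kernel_svm(xs, [labeled[p] for p in xs])
        candidates = [p for p in pool if p not in labeled]
        query = min(candidates, key=lambda p: abs(f(p)))
        labeled[query] = simulator(query)  # one expensive simulator trial
    xs = list(labeled)
    return train_kernel_svm(xs, [labeled[p] for p in xs]), labeled


# Candidate inputs: a 21 x 21 grid over [-1, 1]^2.
pool = [(round(-1 + 0.1 * i, 2), round(-1 + 0.1 * j, 2))
        for i in range(21) for j in range(21)]
# Seed trials chosen so both output classes are present from the start.
seeds = [(0.0, 0.0), (-1.0, -1.0), (1.0, 1.0), (-1.0, 1.0), (1.0, -1.0)]

f, labeled = active_learning(pool, seeds, budget=15)
acc = sum((1 if f(p) > 0 else -1) == simulator(p) for p in pool) / len(pool)
print(f"{len(labeled)} simulator trials, pool accuracy {acc:.2f}")
```

Here only 20 of the 441 grid points are ever passed to the simulator; a grid-based baseline would label them all. The accuracy printed on the pool is a rough stand-in for the "fidelity" measure in the abstract, whose exact definition is not given here.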
Publication date: 2006